ABSTRACT

In the use of non-antibody proteins as affinity 
reagents, diversity has generally been derived 
from oligonucleotide-encoded random amino acids.
Although specific binders of high-affinity have been 
selected from such libraries, random oligonuc- 
leotides often encode stop codons and amino acid 
combinations that affect protein folding. Recently it 
has been shown that specific antibody binding loops 
grafted into heterologous proteins can confer the 
specific antibody binding activity to the created 
chimeric protein. In this paper, we examine the use 
of such antibody binding loops as diversity ele- 
ments. We first show that we are able to graft a 
lysozyme-binding antibody loop into green fluores- 
cent protein (GFP), creating a fluorescent protein 
with lysozyme-binding activity. Subsequently we 
have developed a PCR method to harvest random 
binding loops from antibodies and insert them at 
predefined sites in any protein, using GFP as an 
example. The majority of such GFP chimeras remain 
fluorescent, indicating that binding loops do not 
disrupt folding. This method can be adapted to the 
creation of other nucleic acid libraries where divers- 
ity is flanked by regions of relative sequence 
conservation, and its availability sets the stage for 
the use of antibody loop libraries as diversity 
elements for selection experiments. 



INTRODUCTION 

It is believed that a new suite of technologies, generically 
termed the 'display' technologies will overcome many of 
the disadvantages associated with the generation of antibodies 
by immunization. In particular, they avoid animals, provide 
monoclonal reagents and since genes are cloned simultan- 
eously with selection, can be easily manipulated to provide 
novel downstream reagents with additional properties. 



Although antibody fragments were originally most com- 
monly used as scaffolds, many other proteins have also been 
used successfully (1,2), with the most widely pursued being 
single domains based on the immunoglobulin fold: e.g. single 
VH (3) or VL (4) chains, camel VHH domains (5), CTLA4 (6) 
or fibronectin (7) domains. In general these tend to be 
relatively well expressed (1-10 mg/l) with affinities in the
nanomolar range, although expression in intracellular com- 
partments can be difficult due to the presence of disulfide 
bonds. Beyond immunoglobulin domains, nanomolar binders 
have also been selected from libraries based on a three helix 
bundle domain from protein A [Affibodies (8,9)], lipocalins
[termed anticalins (10,11)], cysteine rich domains (12) and
ankyrins [termed DARPins (13,14)], with X-ray crystallography
(13,15) of anticalins and ankyrins showing that the
mutated residues undergo structural changes, when compared
to the parent molecule, to accommodate binding.

Transformation of a protein into a binding scaffold requires 
the introduction of diversity at the site targeted to become the 
binding site. This has generally been either replacement
diversity (3-6,8-11,13), where amino acids present in the
scaffold of interest, within the chosen loops or surfaces, are
randomized, or insertional diversity, where a specific
insertion site is chosen and stretches of random amino acids are
inserted. The latter has been carried out both in antibody
binding loops (16-19) and other proteins (20-24), with 
diversity derived from random peptides encoded by degener- 
ate oligonucleotides or in rare cases by trinucleotide codons 
(25). Recently, antibodies with high affinities have also 
been selected from libraries where the introduced comple- 
mentarity determining region (CDR) diversity is limited to 
only four (tyrosine, alanine, aspartate and serine) (26) or 
two (tyrosine and serine) (27) different amino acids at spe- 
cific sites in multiple CDRs. 

Nature provides a potential source of functional and well 
folding binding elements in the form of the binding loops 
which make up the antibody binding site. Antibodies contain 
six such binding loops, termed CDRs, which are involved in 
forming the antibody binding site. The first and second CDRs 
in both light and heavy chains are encoded by the germline V 
genes and subsequent mutation, while CDR3 is created as a 
result of recombination between V and J genes in the case 
of the light chain, and V, D and J genes for the heavy chain
(28,29). Further diversity is created by the addition and loss 
of nucleotides at the junctions between the recombined 
gene segments (30,31) and somatic hypermutation (32). 
Structurally, each class of CDRs is similar in size and struc- 
ture, with each adopting one or a few distinct or 'canonical' 
conformations (33-35). HCDR3 is an exception, showing 
wide variations in length, structure, shape and sequence 
(36,37), as well as intrinsic conformational diversity 
(38-40), reflecting the importance of HCDR3s in antibody 
binding specificity (41,42). Given these data, and the fact
that HCDR3s also contain very few stop codons, they appear
to represent a very effective form of diversity. This conclusion
is bolstered by the structural conservation found at the
ends of HCDR3s, revealed by the finding that the four
N-terminal and six C-terminal residues from different
HCDR3 regions demonstrate <2.75 Å r.m.s.d. for >99.7% of
all pairwise comparisons examined (37). As a result,
HCDR3s would be expected to be less disruptive to protein 
structure than random peptides of the same length. Further- 
more, if a scaffold is able to accept a single HCDR3 at a spe- 
cific site, it is likely that many different HCDR3s can also be 
accommodated at that same site. 

Although libraries of HCDR3s have never been assessed for 
their effects on protein structure, a number of examples of the 
use of specific antibody CDRs as diversity elements able to 
transplant binding activity to a heterologous protein have 
been described. An HCDR3 from an integrin binding antibody 
has been inserted into an exposed loop in tissue-type plasmino- 
gen activator, conferring integrin binding activity to 
the plasminogen activator, without eliminating its normal 
enzymatic function (43). Similarly, a CDR3 from a camelid 
VHH recognizing lysozyme has been transplanted to neocar- 
zinostatin, a bacterial chromoprotein with a beta sheet structure, 
allowing the chimeric molecule to recognize lysozyme (44). As 
camelid VHH CDR3s are very similar to traditional antibody 
HCDR3s, these two examples indicate the potential for using 
libraries of HCDR3s as diversity/binding elements, if means
for harvesting that diversity can be developed. More recently, 
an HCDR1 loop from a CD4 binding antibody was inserted 
into three exposed loops of the protein inhibitor of neuronal 
nitric oxide synthase and each construct was shown to exhibit 
CD4 binding (45). This work was based on earlier work show- 
ing that peptides derived from five of the six CDRs of the anti- 
CD4 antibody, and not other regions of the variable region were 
able to bind CD4 as soluble, circularized peptides (46). 

While these experiments show that, in these specific
examples, HCDR3s can be inserted into heterologous proteins
without disruption of protein function, they do not
demonstrate that this can be carried out generally.

In this paper we explore the possibility of using HCDR3s as 
a source of insertional diversity. Using the green fluorescent 
protein (GFP), which is not fluorescent unless correctly folded 
(47), as a reporter protein, we first show that the VHH CDR3 
described above is also functional when inserted into two sites 
in GFP. Subsequently we describe a novel PCR method to
harvest HCDR3 diversity, based on the fact that the N- and
C-terminal HCDR3 amino acids (CXX...XXWG) are 
extremely well conserved at the DNA, protein and structural 
levels. We examine the effects of inserting antibody binding 
loops amplified using this method into GFP, and show that 
for most sites, and most HCDR3s, there is relatively little dis- 
ruption to GFP function, validating HCDR3s as a potential 
source of diversity. These experiments set the stage for further 
exploration of the use of HCDR3s as diversity elements in a 
variety of different scaffold proteins. 

MATERIALS AND METHODS 

pET-CK3 expression vector construction 

Four BpmI sites, one SphI site and one BssHII site were eliminated
from pET-C6His (48), a pET-28 derivative. The SphI site
was eliminated by digesting pET-C6His with SphI. The linear
DNA fragment was treated with T4 DNA polymerase and
re-ligated with T4 DNA ligase. The ligation was digested
with SphI and transformed into DH5αFT cells. The BpmI
sites and the BssHII site were mutated using the Stratagene
mutagenesis kit (Stratagene, La Jolla, CA) according to the
manufacturer's recommendation. Briefly, 100 ng of pET-C6His-ΔSphI
template DNA was amplified in a 25 µl reaction
using 1 µM of the primers indicated in Table 1, 1 µl
of dNTP mix, 0.75 µl of QuikSolution and 1 µl of
QuikChange® Multi enzyme blend, with the following
temperature cycle: 95°C for 1 min followed by 30 cycles of
95°C for 1 min, 55°C for 1 min, 65°C for 10 min. The
PCR product was digested with DpnI, BpmI and BssHII for
1 h and the mixture was transformed into XL10-Gold®
ultracompetent cells. The resulting pET-CK3 vector was
checked by restriction mapping and sequencing.

SacB insertion into the GFP loops 

SacB is a negative selectable marker, which can be used to
kill bacteria bearing it by growth on sucrose (49). The SacB
gene was inserted into superfolder GFP (50) at each of the
different identified loop sites (Table 2) in such a way that it
was flanked by two type IIs BpmI restriction sites. These allowed
the removal of the sacB gene and the creation of an appropriate
cloning site for CDR3 sequences, which were also flanked
with compatible BpmI sites. After BpmI cleavage, the N and
C portions of a generic CDR3 were exposed, allowing the
reassembly of a full CDR3 after ligation of amplified
CDR3 inserts (see Figure 1). Since the sacB gene disrupts
the GFP coding sequence, clones are not fluorescent unless
permissive CDR3s have been inserted. These vectors were
created by amplifying the full pET-CK3-sfGFP plasmid
using pairs of primers flanking each insertion site. This created
the following structure (illustrated for the insertion at
loop 2), with the portion in green corresponding to GFP,
the portion in red representing the primer encoded sequences
which complement the cloned HCDR3, and the underlined
bases the cleavage site for the indicated BpmI sites:



       F   K   D   S                                                                                   G   D   G
5' .. TTC AAA GAT TCT GGC GAG GAA TAC TAA CTC CAG AGT AGA CCC TAA TGA TGA GCT GGA GCC TAA AGA CCC GGG GGC GAC GGG ..
3' .. AAG TTT CTA AGA CCG CTC CTT ATG ATT GAG GTC TCA TCT GGG ATT ACT ACT CGA CCT CGG ATT TCT GGG CCC CCG CTG CCC ..
                                          BpmI                            BpmI
Table 1. The sequences of the primers used to create the different recipient GFP vectors

Primer                   Sequence (5' to 3')

Loop 1 (23/24) and loop 1a (22/24)
pDAN5-GFP-loop1-3'       TAATGATGAGCTGGAGCCTAAAGACCCGGGGGCGGGCACAAATTTTCTGTCAGAGGAG
pDAN5-GFP-loop1.4-5'     GGGTCTACTCTGGAGTTAGTATTCCTCGCCAGAATTAACATCACCATCTAATTCAACAAG
pDAN5-GFP-loop1a.4-5'    GGGTCTACTCTGGAGTTAGTATTCCTCGCCAGAAACATCACCATCTAATTCAACAAG

Loop 2 (102/103) and loop 2a (101/102)
pDAN5-GFP-loop2-3'       TAATGATGAGCTGGAGCCTAAAGACCCGGGGGCGACGGGACCTACAAGACGCGTGCTG
pDAN5-GFP-loop2.4-5'     GGGTCTACTCTGGAGTTAGTATTCCTCGCCAGAATCTTTGAAAGATATAGTGCGTTC
pDAN5-GFP-loop2a-3'      TAATGATGAGCTGGAGCCTAAAGACCCGGGGGCGATGACGGGACCTACAAGACGCGTGCTG
pDAN5-GFP-loop2a.4-5'    GGGTCTACTCTGGAGTTAGTATTCCTCGCCAGATTTGAAAGATATAGTGCGTTC

Loop 3 (173/174) and loop 3a (172/173)
pDAN5-GFP-loop3-3'       TAATGATGAGCTGGAGCCTAAAGACCCGGGGGCGGTTCCGTTCAACTAGCAGACCAT
pDAN5-GFP-loop3.4-5'     GGGTCTACTCTGGAGTTAGTATTCCTCGCCAGAATCTTCAACGTTGTGGCGAATTTTG
pDAN5-GFP-loop3a-3'      TAATGATGAGCTGGAGCCTAAAGACCCGGGGGCGATGGTTCCGTTCAACTAGCAGACCAT
pDAN5-GFP-loop3a.4-5'    GGGTCTACTCTGGAGTTAGTATTCCTCGCCAGATTCAACGTTGTGGCGAATTTTG

Loop 4 (213/214)
pDAN5-GFP-loop4-3'       TAATGATGAGCTGGAGCCTAAAGACCCGGGGGCAAGCGTGACCACATGGTCCTTCTT
pDAN5-GFP-loop4.4-5'     GGGTCTACTCTGGAGTTAGTATTCCTCGCCAGATTCGTTGGGATCTTTCGAAAGGACAG

Loop 5 (51/52)
pDAN5-GFP-loop5-3'       TAATGATGAGCTGGAGCCTAAAGACCCGGGGGCAAACTACCTGTTCCATGGCCAACACTTG
pDAN5-GFP-loop5-5'       GGGTCTACTCTGGAGTTAGTATTCCTCGCCAGATCCAGTAGTGCAAATAAATTTAAGGGTGAG

SacB primers
SacB.2-5'                C-GGGGGTCTGGCGAGGAATACTAACTCCAGTTTTTAACCCATCACATATACCTGCCGTTCAC
SacB-3'                  GGGGGAACCGCCCCCGGGTCTTTAGGCTCCAGCCGCTTCTCAACCCGGTACGCACCAG

Restriction enzyme mutation primers
BpmI 2621                GCTCGTTGAGTTTCTCAAGAAGCGTTAATGTCTGGC
BpmI 3288                CGATCATCGTCGCGCTCAAGCGAAAGCGGTCC
BpmI 3922                GACATGGCACTCCAATCGCCTTCCCGTTCCGC
BpmI 4411                GCGTGCAGGGCCAGACTAGAGGTGGCAACGCC
BssHII 3288              GACTCGGTAATGGCACGCATTGCGCCCAGC

The portion in green corresponds to GFP and the portion in red, the conserved portion of the HCDR3. Underlined bases represent restriction sites (BpmI and SmaI).



For each of the different loops, the primers in Table 1 were
used. These PCR products were cleaved with BpmI and ligated
to SacB amplified with sacB.2-5' and sacB-3', also
cleaved with BpmI (Table 1). These SacB primers placed
BpmI sites at equivalent positions, allowing the SacB gene
to be removed by cleavage with BpmI in the ligated clone.
After cloning, bacteria were tested for their inability to
grow on both liquid and agar media containing 2-5% sucrose,
as well as by restriction digestion.

HCDR3 amplification 

Total RNA was prepared from 40 different samples of human
peripheral blood lymphocytes purified by Ficoll Hypaque
(Amersham Pharmacia Biotech, UK). Pathogens were deemed
to be inactivated by the use of Trizol to purify RNA. This work
was carried out under the auspices of the LANL IRB. cDNA
was synthesized using random hexamers and reverse transcriptase
following standard protocols. HCDR3s were amplified
by nested PCR using the IgM for forward primer and a
mixture of VH primers (4-6,10,12,14,22,51) with the following
temperature cycle: 94°C, 60 s followed by 30 cycles of
94°C, 30 s, 55°C, 30 s, 72°C, 1 min followed by 72°C for
7 min. One microliter of the first PCR after gel purification
was used as template in the second PCR. Biotinylated primers
in Tables 2-5 were used to amplify the CDR3 sequences with
the following temperature cycle: 94°C, 60 s followed by
30 cycles of 94°C, 30 s, 50°C, 30 s, 72°C, 1 min followed
by 72°C for 7 min. The PCR product was phenol/chloroform
extracted and ethanol precipitated. It was dissolved in 90 µl of
water and digested with 50 U of BpmI for 2 h. The enzyme
was heat inactivated at 65°C for 20 min. A total of 100 µl of
M-280 Streptavidin Dynabeads (Dynal, Norway) was washed
three times with TE and the beads were resuspended in the
BpmI digested PCR products. The beads were mixed at
room temperature for 30 min and collected with a magnet.
The supernatant, which contains digested HCDR3s, was used
directly in the ligation reactions.

Library construction 

pET-CK3-sfGFP-SacB vectors for the eight different loop sites were digested
with BpmI, treated with Antarctic phosphatase and gel purified
using the Qiagen gel purification kit. The concentration
of the vector was measured by spectrofluorometer and ligations
were set up with CDR3 fragment at a vector:insert
ratio of 1:3 overnight at 4°C in 20 µl volume using 800 U
of T4 DNA ligase (NEB). tRNA (1 µg) was added to the reactions
and the total ethanol precipitated and redissolved in
50 µl of water. A total of 2 µl of each of the ligation reactions
was electroporated into BL21 (DE3) Gold (Novagen) cells
and plated on nitrocellulose filters on Luria-Bertani (LB)
plates containing 50 µg/ml kanamycin/2% glucose/2% sucrose.
Cells were grown overnight at 37°C. The filters were
transferred onto kanamycin LB plates containing 1 µg/ml
isopropyl-β-D-thiogalactopyranoside (IPTG) and induced for
4 h at 30°C.
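
As an illustration of how a 1:3 vector:insert ratio translates into DNA mass, the following minimal sketch assumes the ratio is a molar ratio and uses hypothetical fragment lengths (a 6000 bp vector and a 60 bp HCDR3 insert); neither length is given in the text, and the function name is ours.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_per_vector=3.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * molar ratio.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


# Hypothetical example: 100 ng of a 6000 bp vector and a 60 bp insert.
print(round(insert_mass_ng(100.0, 6000, 60), 3), "ng of insert")
```

Because the insert is so much shorter than the vector, only a few nanograms of insert are needed to reach a threefold molar excess under these assumptions.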

Determination of c-lys affinity by flow cytometry 

Streptavidin coated beads (50 µl) (Spherotec) were incubated
with either 1 ng of biotinylated (Pierce Biotechnology) lysozyme
(Sigma) or 15 ng of biotinylated 9E10 (anti-myc,
[Figure 1 graphic. Panel (a): loop insertion sites in GFP (in the order 1, 5, 2, 3, 4) and ligation of a CDR3 into a specific site flanked by conserved HCDR3 bases; key: BpmI site, HCDR3, HCDR3 overhangs. Panel (b): loop 2 SacB vector; the vector after cleavage with BpmI and removal of the SacB insert; the PCR product from HCDR3 amplification using VR35-1.4 and JH1.4-3' (B = biotin) before and after cleavage with BpmI; and the final cloned product after ligation.]

Figure 1. (a) The genetic rearrangement which creates human VH genes, and the PCR strategy used to amplify, digest and purify HCDR3s is shown on the left,
while on the right is shown the general scheme used to create a template for simple insertion of HCDR3s exploiting type IIs restriction sites and a negative
selectable marker, such as SacB. (b) The detailed sequence of a recipient vector containing SacB inserted into loop 2, and the cloning strategy used to insert
HCDR3s. The letters depicted in green represent GFP sequences, black are HCDR3 sequences, and red are junctional sequences which come together during the
ligation procedure. As a result of the cloning procedure, the conserved cysteine present in the HCDR3 is converted into a serine. BpmI R (underlined) identifies
the BpmI recognition site, while BpmI C identifies the cleavage site.



Upstate, New York) antibody in a final volume of 100 µl
phosphate-buffered saline (PBS), for 1 h at room temperature.
A total of 100 µl of 5% BSA was added to block the bead
surface and incubated for a further hour at room temperature.
Beads were washed once in PBS and resuspended in 150 µl of
PBS. GFP containing the anti-lysozyme CDR3 with an initial
concentration of 0.6 mg/ml was 2-fold serially diluted and 5
µl of antigen coated beads added to 50 µl diluted protein per
well. After 1 h incubation at room temperature the supernatant
was removed by washing once in PBS and the beads were
analyzed by flow cytometry using a FACSAria instrument
(BD Biosciences). For the determination of affinity at each
dilution, the mean fluorescence intensity of the non-specific
binding of c-lys to the beads coated with an irrelevant target
was subtracted from the specific fluorescence of the lysozyme
coated ones. The resulting fluorescence values at each dilution
were fitted to a logistic function using Origin (Microcal
Software, Inc., Northampton, MA) and the affinity determined
as the concentration at which half maximal fluorescence
was obtained (52).
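
The background subtraction and half-maximal estimate described above can be sketched as follows. This is not the Origin logistic fit used in the paper: as a stand-in it assumes a simple one-site saturation model, F = Fmax * c / (Kd + c), fitted by a grid search over Kd, and the data in the example are synthetic. All function names are ours.

```python
def subtract_background(specific, nonspecific):
    """Subtract irrelevant-target bead fluorescence from lysozyme-bead fluorescence."""
    return [s - n for s, n in zip(specific, nonspecific)]


def fit_one_site(conc, signal, n_grid=400):
    """Least-squares fit of signal = fmax * c / (kd + c).

    kd is scanned on a logarithmic grid; for each kd the optimal fmax
    has a closed form, so no external optimizer is needed.
    Returns (kd, fmax); kd is the half-maximal concentration.
    """
    lo, hi = min(conc) / 10.0, max(conc) * 10.0
    best = None
    for i in range(n_grid):
        kd = lo * (hi / lo) ** (i / (n_grid - 1))
        x = [c / (kd + c) for c in conc]
        fmax = sum(xi * yi for xi, yi in zip(x, signal)) / sum(xi * xi for xi in x)
        sse = sum((yi - fmax * xi) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]


# Synthetic 2-fold dilution series (arbitrary concentration units), true kd = 1.3.
conc = [20.0 / 2 ** i for i in range(10)]
nonspecific = [50.0] * len(conc)
specific = [1000.0 * c / (1.3 + c) + 50.0 for c in conc]

kd, fmax = fit_one_site(conc, subtract_background(specific, nonspecific))
print("half-maximal concentration ~", round(kd, 2))
```

With real data, a dilution series that does not reach the plateau will leave Fmax, and therefore the half-maximal concentration, poorly constrained.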

Flow cytometric analysis of bacterial fluorescence 

Bacterial libraries expressing GFP clones were inoculated
in minimal medium (53) and grown overnight at 37°C. The
following day 1 ml of autoinduction medium (53) was inoculated
with 10 µl of each library and grown at 30°C overnight.
The cells were diluted in PBS and analyzed using a BD LSR
II flow cytometer (BD Biosciences), using the 488 nm laser to
excite GFP.

Protein expression, purification and characterization 

All plasmids were transformed into Escherichia coli BL21
Gold, plated on 2XTY/Kan/10% glucose and grown overnight 
at 37°C. Individual colonies were picked and grown overnight 
in liquid 2XTY/Kan/Glucose at 37°C. Confluent culture (1 ml) 

Table 2. GFP insertion sites

Loop   Amino acids           Loop   Amino acids
1      23/24     VN/GH       1a     22/24     DV/N/GH
5      51/52     TG/KL
2      102/103   KD/DG       2a     101/102   FK/DD
3      173/174   ED/GS       3a     172/173   IE/DG
4      213/214   NE/KR

The sites at which HCDR3 libraries were inserted into GFP are indicated. For
loops 1, 2 and 3, two different insertion points were used as shown in the table.
In loop 1a the underlined asparagine was deleted and the HCDR3s inserted
between the valine and the glycine.



was used to inoculate 50 ml 2XTY/Kan/IPTG in 250 ml sha- 
ker flasks for expression at 30°C overnight. 

Proteins were purified by low-salt immobilized metal affin- 
ity chromatography. Cultures were harvested by centrifuga- 
tion, sonicated and resuspended in 10 mM Tris.HCl (pH 8.0) 
and recentrifuged at 3000 g for 30 min at 4°C. The supernatant 
was applied to IMAC columns pre-equilibrated with Tris for 
initial adhesion. The flow-through was reapplied three addi- 
tional times and washed with 20 bed volumes Tris. An addi- 
tional wash of 20 bed volumes of 10 mM Tris/300 mM 
NaCl/10% glycerol was performed preceding a final Tris 
wash before elution in 600 mM Imidazole. The buffers were 
exchanged from the eluted proteins using three passes of 
spin filtration with 10 000 MWCO Amicon Ultra centrifugal 
filtration devices at 4°C. The desalted proteins were diluted 
in prechilled Tris and stored at 4°C preceding further evaluation.
Protein samples for SDS-PAGE comparison were
diluted for equivalent fluorescence utilizing a Tecan SpectraFluor
Plus plate fluorometer equipped with 485 nm excitation
and 535 nm emission filters prior to standard denaturation
and gel loading.

Absorption spectra were collected on a ThermoSpectronic 
Genesys2 and exported to Microsoft Excel for comparison. 
Excitation and emission spectra were generated by either a 
QuantumMaster 6SE (Photon Technologies Incorporated; 
Edison, NJ) or a SPEX Fluorolog spectrofluorometer utilizing 
1 cm² cuvettes. Excitation scans were evaluated with 509 nm
emission wavelength. Emission scans were generated with 488 
nm excitation. All emission scans were normalized to the 
maximum value obtained at the main emission peak for each 
sample. 

Surface plasmon resonance analysis 
of anti-lysozyme HCDR3 clones 

SPR analysis was performed on a BIAcore 2000, using a 
Streptavidin chip (purchased from BIAcore). Our in-house 



Table 3. Primer analysis

Primer      Sequence               Degeneracy   Rearranged V                           Germline V

5' set
VR35-1.4    GCCACRTATTACTGTG       2            123 (72, 51)                           3
VR35-2.4    GCCATNTATTACTGTG       4            434 (107, 30, 258, 39)                 3
VR35-3.4    GCCGTHTATTACTGTG       3            577 (356, 148, 73)                     1
VR35-4.4    GCCTTGTATTACTGTG       1            66                                     2
VR35-5.4    GCTGTHTATTACTGTG       3            391 (120, 162, 109)                    0
VR35-6.4    GCYGTGTATTACTGTG       2            2118 (974, 1144)                       35
VR35-7.4    GCYGTVTATTATTGTG       6            303 (33, 36, 95, 23, 49, 67)           0
VR35-8.4    GCYGTNTATTTCTGTG       8            170 (17, 23, 37, 17, 8, 13, 47, 9)     0
Total                                           4182/5646 (74%)                        44/49 (90%)

3' set
JH1.4-3'    CCAGGGTGCCCTGGCCCCA    1            170                                    JH4, JH5
JH2.4-3'    CCAGGGTGCCACGGCCCCA    1            94                                     JH6
JH3.4-3'    CCATTGTCCCTTGGCCCCA    1            487                                    JH3
JH4.4-3'    CCAGGGTTCCCTGGCCCCA    1            1958                                   JH1
JH6.4-3'    CCGTGGTCCCTTGGCCCCA    1            802                                    JH2
Total                                           3511/5646 (62%)                        6/6 (100%)

Sequences of those portions of the 5' and 3' primers corresponding to the VH or JH genes suitable for the amplification of HCDR3s are shown. These were analyzed
by using only the portion of the primer which recognizes the V or J gene and searching against the database of 5646 rearranged V genes or the 49 germline V genes or
6 JH genes. These analyses are stringent (100%), so it is likely that in real experimental situations, more sequences are likely to be amplified, as in each case the
3' primer sequences are extremely well conserved. Under 'Rearranged V', the total numbers of the 5646 rearranged VH genes downloaded from IMGT with
absolute homology to each of the primers are given. In brackets are given the number of VH genes recognized by each of the individual primers making up the
degenerate pool. Under 'Germline V', the number of germline VH genes with 100% homology to the primers is given.
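
The degeneracy values and per-variant match counts in Table 3 follow from expanding each degenerate primer with the standard IUPAC ambiguity codes and searching for exact matches. The sketch below is a minimal illustration of that calculation; the function names and the three test sequences are hypothetical and are not taken from the IMGT data set.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer:
        n *= len(IUPAC[base])
    return n


def expand(primer):
    """List every non-degenerate sequence in the primer pool."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


def count_exact_matches(primer, sequences):
    """Per-variant count of sequences containing a 100% match."""
    return {v: sum(v in s for s in sequences) for v in expand(primer)}


print(degeneracy("GCYGTNTATTTCTGTG"))  # Y (2 bases) x N (4 bases)

# Hypothetical V gene fragments, for illustration only.
toy = [
    "AAGCCACATATTACTGTGCGAGA",
    "GGGCCACGTATTACTGTGCAAAA",
    "TTGCCGTGTATTACTGTGCGAGA",
]
print(count_exact_matches("GCCACRTATTACTGTG", toy))
```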
Table 4. Primer localization

VH genes

102 103 104 105 106 107
TAT TAC TGT GCG AGA GA
> primer set

JH genes 

115 116 117 118 119 120 121 122 123 124 125 126 127 128 
FM QD W G QR G T MLT V T V S S 

JH1 TTC CAG CAC TGG GGC CAG GGC ACC CTG GTC ACC GTC TCC TCA G/ 



JH2 . . . G.T .T. ... .GT T / 

JH3 ..TG.TAT. . . . J _ L . ..A..G..AA / 

JH4 . .TG.CT A / 

JH5 . . . G.C .C. . . . ^ A / 

JH6 A.G G.C GT A . . G . . . AC / 



< primer set 



The sequences shown represent the full diversity found in the germline V genes
centered around the conserved cysteine (TGT). All six JH sequences are 
indicated. For both VH and JH, the region of primer recognition is given, 
and the cut site when Bpml is used is underlined. The IMGT numbering system 
is used. 



Table 5. HCDR3 amplification primer sequences

5' primer sets
VR35-1.4    5'Biotin-AACGTGCTGGAGGCCACRTATTACTGTG
VR35-2.4    5'Biotin-AACGTGCTGGAGGCCATNTATTACTGTG
VR35-3.4    5'Biotin-AACGTGCTGGAGGCCGTHTATTACTGTG
VR35-4.4    5'Biotin-AACGTGCTGGAGGCCTTGTATTACTGTG
VR35-5.4    5'Biotin-AACGTGCTGGAGGCTGTHTATTACTGTG
VR35-6.4    5'Biotin-AACGTGCTGGAGGCYGTGTATTACTGTG
VR35-7.4    5'Biotin-AACGTGCTGGAGGCYGTVTATTATTGTG
VR35-8.4    5'Biotin-AACGTGCTGGAGGCYGTNTATTTCTGTG

3' primer sets
JH1.4-3'    5'Biotin-TGAGGAGACTGGAGCCAGGGTGCCCTGGCCCCA
JH2.4-3'    5'Biotin-TGAGGAGACTGGAGCCAGGGTGCCACGGCCCCA
JH3.4-3'    5'Biotin-TGAAGAGACTGGAGCCATTGTCCCTTGGCCCCA
JH4.4-3'    5'Biotin-TGAGGAGACTGGAGCCAGGGTTCCCTGGCCCCA
JH6.4-3'    5'Biotin-TGAGGAGACTGGAGCCGTGGTCCCTTGGCCCCA

The 5' and 3' primer sequences used are shown. Each primer is biotinylated at
the 5' end. The biotin is followed by four bases to assist in recognition and
cleavage by the type IIs enzyme, BpmI (recognition sequence CTGGAG 16/14).
The cut site is underlined.



biotinylated lysozyme was used as ligand on the Streptavidin 
chip (flow cells 1,2,3), and our biotinylated myoglobin was 
used as the negative control (non-specific binding control) 
ligand on the same chip (flow cell 4). Approximately 4000 
RUs of both ligands were bound to the chip, under which 
conditions specific binding could be demonstrated. 
RESULTS

Inserting a defined CDR3 into GFP 
to confer binding activity 

The HCDR3 from a VHH recognizing lysozyme has been 
transplanted to neocarzinostatin, a bacterial chromoprotein 
with a beta sheet structure, with the chimeric molecule recog- 
nizing lysozyme with an affinity of 500 nM (44). We attemp- 
ted to replicate this finding, by transferring the same HCDR3 
to two surface exposed loops in 'Superfolder' GFP (sfGFP) 
(50), a GFP mutant selected to be resistant to the destabilizing 
effects of poorly folding proteins fused to its N-terminus, and 
hence more stable than other forms of GFP. In order to effect- 
ively use HCDR3s as diversity elements, both structural and 
sequence conservation must exist at the N- and C-terminal 
ends of the isolated HCDR3. Structural conservation is 
required to ensure that once a permissive site has been cho- 
sen, different HCDR3s can be inserted equally effectively 
at the same site, while sequence conservation is required to 
allow effective cloning of the isolated HCDR3s. Within the 
four N-terminal and six C-terminal amino acids from differ- 
ent HCDR3 regions found to be structurally similar by Morea 
et al. (37), the DNA sequences encoding the N-terminal
cysteine and C-terminal tryptophan and glycine are extremely
conserved. As the cysteine 104 [IMGT numbering (54)] usually
forms a double hydrogen bond with the glycine 119 (37),
these two amino acids were chosen to be the limits of the
cloned HCDR3. However, to avoid the presence of an
unpaired cysteine (the HCDR3 N-terminal cysteine normally
disulfide bonds with another cysteine in framework one), this
codon was mutated to a serine. Serine is identical to cysteine,
except for the replacement of sulfur by oxygen, and so is
able to form the same hydrogen bonds. In order to create
recipient GFPs which could be used for cloning HCDR3 libraries,
as well as the specific anti-lysozyme HCDR3, we inserted
(see Figure 1 and below) a SacB gene at each targeted insertion
site flanked by BpmI sites. The SacB gene is a negative
selector able to reduce vector background by 10^5-fold by plating
bacteria on sucrose after transformation (49,55). BpmI is
a type IIs restriction enzyme which cleaves 14/16 bp away from
its recognition site. The cleavage sites were designed to
include conserved 5'- and 3'-HCDR3 sequences, which
were exposed after digestion by BpmI, allowing the reconstruction
of full-length HCDR3s within the GFP from either
annealed oligonucleotides or amplified PCR fragments.
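
To make the geometry of the type IIs cut concrete, the sketch below locates a forward-orientation BpmI site (CTGGAG) and reports the staggered cut 16 nt downstream on the top strand and 14 nt downstream on the bottom strand. It is an illustration only: it ignores sites on the reverse strand, the function name is ours, and the eight bases appended to the VR35-4.4 primer are a hypothetical stand-in for the start of an HCDR3.

```python
BPMI_SITE = "CTGGAG"
TOP_OFFSET = 16     # top-strand cut, nt after the recognition site
BOTTOM_OFFSET = 14  # bottom-strand cut, nt after the recognition site


def bpmi_cut(seq):
    """Return cut positions (top-strand coordinates) for the first forward site.

    Returns None if there is no site or the cut would fall beyond the sequence.
    """
    i = seq.find(BPMI_SITE)
    if i < 0:
        return None
    end = i + len(BPMI_SITE)
    top_cut = end + TOP_OFFSET
    bottom_cut = end + BOTTOM_OFFSET
    if top_cut > len(seq):
        return None
    return {
        "top_cut": top_cut,
        "bottom_cut": bottom_cut,
        # top-strand bases lying between the two staggered cuts
        "overhang": seq[bottom_cut:top_cut],
        "released": seq[top_cut:],
    }


primer = "AACGTGCTGGAGGCCTTGTATTACTGTG"  # VR35-4.4 from Table 5
result = bpmi_cut(primer + "CGAGAGAT")    # hypothetical HCDR3 start
print(result)
```

In this example the 16 bases following CTGGAG are exactly the V gene portion of the primer, so the top-strand cut falls at the end of the primer and the staggered cut spans its last two bases.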

The anti-lysozyme HCDR3 described above was synthesized
as a pair of overlapping phosphorylated oligonucleotides.
These were annealed and ligated directly into two of the BpmI
cut vectors (Figure 1). Loops 1 and 3 (see Table 2 for nomenclature)
were both independently targeted. Both clones, named
c-lys1 and c-lys3, depending upon the loop insertion site,
yielded fluorescent proteins. These were expressed in BL21
using either 100 µM IPTG or autoinduction media (53), and
subsequently purified by immobilized metal affinity chromatography
using the C-terminal His6 tag.

Binding between lysozyme and GFP containing this HCDR3 was
demonstrated in an enzyme-linked immunosorbent assay (ELISA)
format in which the lysozyme was biotinylated and interacted
with the modified GFP prior to capture on a neutravidin coated
plate (Figure 2A). Specific binding could also be demonstrated
using a flow cytometric bead based method, in which detection
was carried out by measuring the fluorescence of streptavidin
coated beads to which biotinylated lysozyme and GFP containing
the anti-lys HCDR3 had bound (Figure 2B). Unlike ELISA,
this method relies on the intrinsic fluorescence of the binder,
demonstrating that binding activity and fluorescence reside in
the same protein, and that, at least in this case, HCDR3 insertion
has not disrupted GFP function. This technique can also be used
to determine affinity (52) by incubating microspheres coated
with antigen with increasing concentrations of fluorescent
binder. As concentration increases, the bead bound fluorescence
reaches a plateau, as all target sites on the microspheres are
bound. By subtracting background binding and determining
the concentration of fluorescent binder at which half maximum
fluorescence is obtained, we were able to estimate the affinity of
this interaction to be 1.34 µM for c-lys1 (Figure 2C), similar to
the estimate, obtained by isothermal calorimetry, for
neocarzinostatin containing the same HCDR3 (44). Binding was
also examined using surface plasmon resonance. Although
specific binding could be demonstrated when chips were densely
coated with lysozyme, similar to the neocarzinostatin results
(44), affinity was not high enough to show binding when the
lower levels of coating required to determine affinity were used
(data not shown).
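
The half-maximal estimate described above can be illustrated with a
minimal Python sketch. It is not the analysis code used for Figure 2C:
the titration values are hypothetical, generated from an assumed
one-site binding model, and the function name, the constant
background and the linear interpolation between titration points are
all illustrative assumptions.

```python
def half_max_concentration(concs, specific_beads, control_beads):
    """Estimate the binder concentration giving half-maximal specific signal.

    concs must be sorted in increasing order. The plateau is approximated
    by the largest background-subtracted value in the titration.
    """
    # Subtract the signal of the non-specific (control) beads point by point.
    signal = [s - c for s, c in zip(specific_beads, control_beads)]
    half = max(signal) / 2.0
    for i in range(1, len(concs)):
        lo, hi = signal[i - 1], signal[i]
        if lo < half <= hi:
            # Linear interpolation between the two bracketing points.
            frac = (half - lo) / (hi - lo)
            return concs[i - 1] + frac * (concs[i] - concs[i - 1])
    raise ValueError("half-maximal signal is not bracketed by the data")


# Hypothetical titration (micromolar), simulated from a one-site model.
TRUE_KD = 1.34       # value used only to simulate the toy data
F_MAX = 1000.0       # assumed plateau fluorescence (arbitrary units)
BACKGROUND = 50.0    # assumed signal on control beads
concs = [0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 50, 100, 500]
control = [BACKGROUND] * len(concs)
specific = [BACKGROUND + F_MAX * c / (TRUE_KD + c) for c in concs]

estimate = half_max_concentration(concs, specific, control)
print(f"estimated half-maximal concentration: {estimate:.2f} micromolar")
```

Because the plateau is taken from the highest concentration tested and
the interpolation is linear, the estimate only approximates the value
used to simulate the data; a denser titration around the midpoint, or
a curve fit, would tighten it.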

These results indicate that the orientation and structure of 
the HCDR3 is maintained at both insertion sites and is similar
to that in the original VHH, suggesting that HCDR3s are a
valid potential source of diversity, and may alone provide 
sufficient binding energy to yield micromolar binders. 

Analyzing human HCDR3 flanking sequences 

In order to determine the best way to clone HCDR3s, 5669
human heavy chain variable genes were downloaded from
the IMGT web site (56), using 'human heavy chain variable
genes of any specificity' as the search criterion. These were
pared down to 5646 full-length VH genes, which represent a wide
spectrum of different V genes and encompass the full range of
mutations found at all sites within the V genes,
including potential primer sites flanking the CDR3. Of
these 5646 VH genes, 4842 can be accounted for by searching
for the following motifs (all based on the 10 bases finishing
with the extremely conserved cysteine codon and the base which
follows it, TGT G) found at the 3' end of framework region
3 just before the CDR3 (number of times found):

TATTACTGTG (4061) 

TATTATTGTG (462) 

TATTTCTGTG (319) 
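
The motif search above can be sketched in a few lines of Python. This
is an illustration only, not the script used for the analysis: the toy
sequences are hypothetical, and assigning each sequence to the first
motif it contains is an assumption. The reported counts are taken
from the list above.

```python
from collections import Counter

# The three 10-base motifs reported in the text.
MOTIFS = ("TATTACTGTG", "TATTATTGTG", "TATTTCTGTG")


def count_motifs(sequences, motifs=MOTIFS):
    """Count sequences by the first motif they contain."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for motif in motifs:
            if motif in seq:
                counts[motif] += 1
                break
        else:
            # No motif was found in this sequence.
            counts["unmatched"] += 1
    return counts


# Hypothetical toy sequences (not taken from IMGT).
toy = [
    "GCCGTGTATTACTGTGCGAGAGATCGG",
    "GCCGTATATTATTGTGCGAAAGG",
    "GCCATGTATTTCTGTGCGAGA",
    "GCCGTGTATTACTGCGCGAGA",  # carries none of the three motifs
]
counts = count_motifs(toy)
print(dict(counts))

# Counts reported in the text; their sum is the 4842 sequences quoted.
reported = {"TATTACTGTG": 4061, "TATTATTGTG": 462, "TATTTCTGTG": 319}
print(sum(reported.values()), "of 5646 sequences accounted for")
```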

As amplification usually requires more than 10 bases of
homology, the 4842 sequences described above were extracted
from the database and analyzed for homology upstream of
these 10 bp sequences. Based on the homology found, the
sequences described in Table 3 were designed. A similar
procedure was carried out for the 3' end of the HCDR3 in the JH
gene sequences. However, in this case, the sequences were
centered around the highly conserved CTGGGGCC sequence
found in all JH genes. The alignment of these sequences with
germline V and J genes is given in Table 4.

Figure 2. (A) ELISA was carried out by first interacting biotinylated antigen with the GFP clone of interest and then capturing on a neutravidin coated plate. GFP
binding was revealed using SV5, a monoclonal antibody (73), which recognizes a tag appended to the C-terminus. The non-specific clone was GFP containing
the myc epitope (74) at the loop 3 position, and indicates the level of binding due to a similarly disrupted GFP molecule. clyslp1 and clyslp3 have the lysozyme
binding HCDR3 inserted into loop 1 and loop 3, respectively. (B) Streptavidin coated beads were incubated with either biotinylated lysozyme or myoglobin
(which serves as the negative control for non-specific binding to coupled beads) and subsequently with GFP or GFP containing the anti-lysozyme HCDR3
inserted at loop 1 (c-lys1). Analysis was carried out using a FACSCalibur. (C) As in (B), except that different concentrations of c-lys1 were used and the mean
fluorescence of the non-specific (myoglobin) beads was subtracted from the fluorescence of the lysozyme coated beads and plotted. The affinity is calculated
from the concentration of c-lys1 which gives half maximal fluorescence.

Seven of the thirteen primer sequences were degenerate, 
with each component sequence recognizing V genes. 
Table 3 shows the numbers of the 5646 rearranged and germ- 
line VH genes recognized by each of these sequences, assum- 
ing 100% homology. In experimental use, it is likely that 
many more genes will be recognized by each individual 
primer, since single mismatches do not usually prevent 
PCR amplification, especially when found upstream of nine 
homologous bases. 
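
The two ideas in this paragraph, degenerate primers standing for
several component sequences and single mismatches being tolerated
upstream of nine matching 3' bases, can be sketched in Python. This is
an illustration under stated assumptions: degenerate positions are
assumed to be written with IUPAC codes, the primer shown is
hypothetical (not one of the Table 3 sequences), and the function
names and the one-mismatch threshold are illustrative.

```python
from itertools import product

# IUPAC nucleotide codes (assumed notation for degenerate positions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer.upper()))]


def anneals(primer, site, clamp=9, max_mismatches=1):
    """True if the last `clamp` bases match exactly and the upstream (5')
    part differs at no more than `max_mismatches` positions.

    Both arguments are equal-length strings written 5'->3' in the
    primer's sense.
    """
    if len(primer) != len(site):
        return False
    if primer[-clamp:] != site[-clamp:]:
        return False
    upstream = zip(primer[:-clamp], site[:-clamp])
    return sum(a != b for a, b in upstream) <= max_mismatches


def recognized(degenerate_primer, site):
    """True if any component sequence of the primer anneals to the site."""
    return any(
        anneals(p, site.upper()) for p in expand_degenerate(degenerate_primer)
    )


PRIMER = "GCCGTRTATTACTGTG"  # hypothetical primer with one degenerate base
print(expand_degenerate(PRIMER))
print(recognized(PRIMER, "GCCGTGTATTACTGTG"))  # exact match to a component
print(recognized(PRIMER, "ACCGTGTATTACTGTG"))  # one upstream mismatch
print(recognized(PRIMER, "GCCGTGTATTACTGCG"))  # mismatch within the 3' clamp
```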

Cloning design 

The strategy used to clone HCDR3s amplified from B cells
into GFP is described in Figure 1a and detailed in
Figure 1b. The GFP recipient vectors contain the SacB
gene flanked by the highly conserved 5' and 3' portions of
the HCDR3. GFP containing SacB at the different insert
sites is non-fluorescent, and the only way fluorescence can be
restored is by removal and replacement of the sacB negative
selector with a sequence that encodes a peptide permissive
for GFP folding at the targeted insertion site. In order to
isolate these sequences without flanking framework
sequences, and to recreate full-length HCDR3s, a type IIs
restriction site, BpmI, was used. This enzyme cuts 16/14 bases
away from its recognition site, allowing the site to be placed
upstream of an amplifying oligonucleotide in such a way
that the majority of the oligonucleotide sequence can be
removed after PCR, leaving a 2 bp 3' extension for ligation
(see Figure 1b). Based on the sequences described in
Tables 4 and 5, the BpmI site in the 5' HCDR3 primer was
placed to cleave across the conserved cysteine and adjacent
codon (TGTG), while at the 3' end it was
placed to cleave within the conserved tryptophan codon
(TGGGGC). Overhanging bases are underlined in both
cases. Altogether, eight vectors containing SacB insertions
at eight different sites, comprising five different loops, were
created (Table 2 and below).
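
The geometry of the 16/14 cut described above can be sketched in
Python. This is a toy model, not the primer design used here: the
recognition sequence CTGGAG is not given in the text and is taken from
standard enzyme references, the example sequence is hypothetical, and
only a site in the forward orientation is handled.

```python
BPMI_SITE = "CTGGAG"   # BpmI recognition sequence, top strand 5'->3'
TOP_CUT = 16           # bases past the site at which the top strand is cut
BOTTOM_CUT = 14        # same for the bottom strand (top-strand coordinates)


def bpmi_digest(top_strand):
    """Digest at the first forward-oriented BpmI site.

    Returns the top-strand sequence of the site-containing fragment, the
    2-base single-stranded region between the two cuts, and the top-strand
    sequence of the released downstream fragment.
    """
    top_strand = top_strand.upper()
    start = top_strand.find(BPMI_SITE)
    if start < 0:
        raise ValueError("no BpmI site found")
    site_end = start + len(BPMI_SITE)
    top_cut = site_end + TOP_CUT
    bottom_cut = site_end + BOTTOM_CUT
    if top_cut > len(top_strand):
        raise ValueError("sequence too short for cleavage")
    return {
        "site_fragment_top": top_strand[:top_cut],
        "overhang": top_strand[bottom_cut:top_cut],
        "downstream_top": top_strand[top_cut:],
    }


# Hypothetical amplicon end: 2 filler bases, the site, 14 spacer bases,
# the 2 bases that become the overhang, then the start of the insert.
amplicon = "GG" + BPMI_SITE + "ACGTACGTACGTAC" + "TG" + "TGCGAGAGAT"
cut = bpmi_digest(amplicon)
print(cut["overhang"], cut["downstream_top"])
```

Because the top strand is cut two bases further from the site than the
bottom strand, both products carry a 2-base 3' extension, which is
what allows the trimmed HCDR3 fragment to be ligated into a vector cut
with the same enzyme.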

In order to eliminate the primer ends removed by BpmI
cleavage (corresponding to the framework sequences), the 
oligonucleotides were biotinylated at their 5' ends (Table 5), 
allowing their removal with streptavidin Dynabeads. 

PCR amplification and cloning 

The primers were first tested in non-biotinylated form by
amplifying HCDR3s from the peripheral blood lymphocyte cDNA
of 40 donors. Using each individual primer
with a pool of the complementary primers yielded a 75-150
bp smear for all primers (Figure 3A and B), which is more
visible for the primers recognizing more rearranged V
genes (e.g. VR-35-36.4). When the length of the primers is
taken into account, this corresponds to HCDR3s ranging
from ~20 to 95 bp, similar to the previously published range
of HCDR3 lengths (36) (24-90 bp and Figure 5).



Biotinylated versions of the primers were created and 
pooled for amplification before purification and cloning. 
Unfortunately, primer JH4.4-3 had to be omitted, due to the 
sub-optimal quality of the biotinylated primer. Lane 1 in 
Figure 3C shows the HCDR3 smear prior to digestion with 
BpmI, ranging from 75 to 150 bp (arrow A). After digestion
(lane 2), the smear is reduced in size by 55 bp (range 25-95 bp, 
arrow B), corresponding to the primer portions removed by 
BpmI, which can be seen as an additional sharp lower band
(arrow C). Lane 3 shows the final purified HCDR3 preparation 
after removal of the biotinylated primers using magnetic strep- 
tavidin beads (Dynabeads). The extent and intensity of the 
smear is essentially identical to that in lane 2, except that 
the lower band is eliminated, indicating the efficiency of the 
use of streptavidin to remove the biotinylated external primer 
portions. 

HCDR3 libraries 

In order to assess the effects of the insertion of many different 
HCDR3s on protein folding in general, and GFP folding and 
function in particular, five different loops in GFP were tar- 
geted for the insertion of HCDR3 libraries (Table 2). The 
insertion sites were identified by an examination of the struc- 
ture of GFP (57), with the goal of placing the HCDR3s at the 
tips of the loops, and so hopefully continue the GFP beta 
strand structure into the first part of the HCDR3. In addition, 
one alternative site, differing by a single amino acid, and 
consequently slightly off the tip center, was also targeted 
for three of the loops (1, 2 and 3 in Table 2), to see whether
insertion within loops had to be precisely localized, or 
whether it was sufficient to target a loop, without concern 
for the exact insertion sites. 

The HCDR3 fragments were gel purified and cloned into the
eight recipient GFP vectors. The libraries were then induced
and analyzed by flow cytometry. Each of the libraries showed
significant numbers of fluorescent bacteria (Figure 4) when
induced with IPTG, with loops 1, 1a, 3, 3a and 5 providing
the greatest number, many of which overlapped with the
fluorescence of bacteria expressing GFP. The mean fluorescence
for these libraries ranged from 10 711 to 25 260, while GFP
had a mean fluorescence of 23 274. The remaining loops (2,
2a, 4) were less fluorescent (941-2295), although still
significantly more fluorescent than BL21 (mean 61). All the
fluorescence profiles for bacteria containing GFP with inserts
were broader than that of GFP, with a slightly longer tail on the
low fluorescence portion of the curve. Mean fluorescence
differed by as much as 27-fold between libraries inserted at
different loops (loop 2a compared to loop 1), while the
difference between insertions in the same loop differing by a
single amino acid was in no case greater than 2.5-fold. This
suggests that the primary determinant of fluorescence is the
targeted loop, with the site within the loop playing a lesser role.
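
The fold differences quoted above can be checked with a short Python
calculation. It uses only the mean fluorescence values given in the
text; the variable names are illustrative, and the per-loop values in
the Figure 4 table are not reproduced here.

```python
GFP_MEAN = 23274
BL21_MEAN = 61
PERMISSIVE_RANGE = (10711, 25260)  # loops 1, 1a, 3, 3a and 5
RESTRICTIVE_RANGE = (941, 2295)    # loops 2, 2a and 4


def fold(a, b):
    """Fold difference between two mean fluorescence values."""
    return max(a, b) / min(a, b)


# Brightest library against the dimmest library.
largest_between_loops = fold(PERMISSIVE_RANGE[1], RESTRICTIVE_RANGE[0])
# Dimmest library against GFP and against the BL21 background.
dimmest_vs_gfp = fold(GFP_MEAN, RESTRICTIVE_RANGE[0])
dimmest_vs_bl21 = fold(RESTRICTIVE_RANGE[0], BL21_MEAN)

print(f"largest difference between loops: {largest_between_loops:.1f}-fold")
print(f"dimmest library vs GFP: {dimmest_vs_gfp:.1f}-fold lower")
print(f"dimmest library vs BL21: {dimmest_vs_bl21:.1f}-fold higher")
```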

Sequence analysis of HCDR3s 

Figure 3. PCR amplification of lymphocyte cDNA using a mixture of the 3' J region primers and individual 5' V gene primers (a), or a mixture of the 5' V gene
primers and the individual J region primers (b). In (c), the pooled PCR product prior to digestion is shown in lane 1, the PCR product after digestion
with BpmI in lane 2, and the product after the biotinylated primers are removed in lane 3. In each case the arrows show the amplified HCDR3s, except arrow C in (c).

A total of 302 random clones containing inserts were sequenced to
further analyze the nature of the cloned diversity. The length
distributions of the HCDR3s (Figure 5) showed a slight increase
in shorter HCDR3s (14-16 amino acids), and a reduction in
longer ones (>21 amino acids), compared to the HCDR3 length
distributions reported in the literature (36). All inserts were in
frame with GFP, and nucleotide BLAST searches showed homology
to HCDR3s (Figure 6a shows representative sequences).
Only 1.3% of the sequences contained stop codons, and 4.8%
lacked the characteristic C(S)...WG HCDR3 sequence.

In addition there were 18 sequences repeated more than once, 
corresponding to 5.9% of all sequences. Interestingly, two of 
these duplicated HCDR3 sequences were each found in three 
different loops (Figure 6b) indicating duplication was present 
in the source cDNA, rather than a cloning artifact. This has 
been previously observed in sequencing human peripheral 
B cell V genes (58), and reflects different VH gene pools 
with different representations, such as those derived from 
recent infections, rather than the saturation of diversity. 
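
The per-clone checks described in this section (reading frame, stop
codons, the characteristic C(S)...WG ends, and repeated sequences) can
be sketched in Python. This is a minimal illustration, not the
analysis actually performed: the insert sequences are hypothetical,
the function names are illustrative, and "characteristic" is
simplified to a translated insert that starts with C or S and ends
with WG.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def summarize(inserts):
    """Tally the sequence features discussed in the text."""
    counts = Counter(s.upper() for s in inserts)
    peptides = [translate(s) for s in inserts]
    return {
        "total": len(inserts),
        "in_frame": sum(len(s) % 3 == 0 for s in inserts),
        "with_stop": sum("*" in p for p in peptides),
        "non_characteristic": sum(
            not (p[:1] in ("C", "S") and p.endswith("WG")) for p in peptides
        ),
        # Number of distinct sequences found more than once.
        "repeated": sum(n > 1 for n in counts.values()),
    }


# Hypothetical inserts (not taken from the sequenced libraries).
inserts = [
    "TGTGCGAGAGATTACTGGGGC",  # CARDYWG
    "AGTGCGAAAGGGTGGGGC",     # SAKGWG
    "TGTGCGTAGGATTGGGGC",     # contains a stop codon
    "GCGAGAGATTACTTTGAC",     # lacks the characteristic ends
    "TGTGCGAGAGATTACTGGGGC",  # repeat of the first insert
]
summary = summarize(inserts)
print(summary)
```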

A number of clones from each library were picked at
random and analyzed for their fluorescence, which was then
related to the length of the HCDR3 insert. Figure 6c shows the
fluorescence of individual clones expressed as a percentage
of GFP fluorescence, correlated with the length of the inserted
HCDR3. As can be seen, the spread of HCDR3s ranges
from 40 to 75 bp, with little correlation between length
and fluorescence, except beyond 80 bp, where fluorescence
is reduced. This indicates that, provided the HCDR3 length
is within the normal range (40-75 bp), larger HCDR3s do
not reduce fluorescence more frequently than smaller
HCDR3s.



In order to determine whether there were any differences 
between HCDR3s found in strongly fluorescent clones com- 
pared to weakly fluorescent ones, bacteria containing the 
libraries were flow sorted for fluorescence, and an additional 
434 sequences analyzed. Although the length distributions of 
the HCDR3s in the unsorted and sorted libraries were
extremely similar (see Figure 5), the percentages of repeated 
sequences (1.6%), non-characteristic HCDR3 sequences (2%) 
and sequences containing stop codons (0.7%) were all 
reduced. 

Examination of protein properties 

A number of fluorescent clones containing HCDR3 inserts 
were expressed and purified to study their properties com- 
pared to GFP. The expected size differences between GFP 
and clones containing inserts were apparent for all clones 
(Figure 7a). The expression levels of the different clones
ranged from 5.2 to 138 mg/l, 1.3-30% of the level of GFP.
Normalized absorption/emission spectra were essentially 
identical to GFP (data not shown). 

Figure 4. Flow cytometric analysis of bacteria containing GFP with HCDR3 libraries at different insertion positions. In each panel the grey line represents
BL21 bacteria, and the green line BL21 expressing GFP. Each panel represents libraries inserted at a different loop, with different positions indicated for loops
1, 2 and 3. The mean fluorescence for each of the populations, calculated from the plots, is given in the table.

The stability of some of these proteins was studied using
a real time PCR machine in which protein fluorescence was
monitored as the temperature was gradually raised (0.1°C/s)
(59). Figure 7b shows fluorescence levels for GFP and a
number of different clones, scaled to start at the same
fluorescence level. All proteins, with the exception of 2-G2,
showed two phase melting curves with an initial slow phase for
15-30°C after 50°C, followed by a sharp transition to complete
melting. The midpoint of the cooperative transition
for GFP was 83.5°C, while clones containing HCDR3 inserts
ranged from ~71 to 83.5°C, with two clones (3a-A12 and
3a-C4) slightly more stable than GFP. By 88.5°C all clones
had completely lost fluorescence. This melting pattern is
similar to that shown by the engineered ankyrins, with a less
steep first phase than that seen with the ankyrins (60).
Although not examined in detail, the proteins with the
lower thermal stabilities tended to come from the less
fluorescent libraries (loops 2, 2a and 4).
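
One simple way to read a transition midpoint off a melting curve of
this kind is sketched below in Python. The text does not state how the
midpoints were determined, so taking the temperature of steepest
fluorescence loss is an assumption, and the curve itself is synthetic:
a shallow linear decline multiplied by a sigmoidal unfolding
transition, with all parameters chosen for illustration.

```python
import math


def melting_midpoint(temps, fluor):
    """Return the temperature at which fluorescence falls fastest,
    using central differences over interior points."""
    best_temp, best_slope = None, 0.0
    for i in range(1, len(temps) - 1):
        slope = (fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope < best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


def synthetic_curve(tm, width=1.0):
    """Toy two-phase melting curve sampled every 0.5 degrees from 50 to 90.

    A slow linear loss of fluorescence is multiplied by a sigmoidal
    cooperative transition centred on `tm` (hypothetical parameters).
    """
    temps = [50 + 0.5 * i for i in range(81)]
    fluor = []
    for t in temps:
        slow_phase = 1.0 - 0.005 * (t - 50)
        cooperative = 1.0 / (1.0 + math.exp((t - tm) / width))
        fluor.append(slow_phase * cooperative)
    return temps, fluor


# Midpoints recovered from curves simulated at the two ends of the
# range reported in the text.
for tm in (71.0, 83.5):
    temps, fluor = synthetic_curve(tm)
    print(tm, melting_midpoint(temps, fluor))
```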



DISCUSSION 

Antibody CDRs, or hypervariable regions, are the portions of
the variable regions that constitute the antigen binding
loops. Of the six CDRs, the HCDR3 is the most diverse,
reflecting its complex genetic origin (Figure 1). It also
plays the most important role in antigen recognition, as 
shown by the isolation of pairs of antibodies recognizing 
different antigens which differ only in their HCDR3s 
(41,42). This has not been shown for any of the other 
CDRs, indicating that HCDR3s alone, within the context of 
an antibody variable domain, are able to provide sufficient
binding affinity to discriminate between different antigens.
Furthermore, it has recently been shown that some antibodies 
are able to bind distinctly different antigens as a result of con- 
formational flexibility, in which the HCDR3 undergoes large 
structural changes when binding (38), a property likely to be 
far more widespread than hitherto expected. Coupled with the 
known length (36), chemical (36) and structural (37) diversity 
of HCDR3s, as well as the relative lack of stop codons, these 
results indicate that HCDR3s may serve as useful sources of 
diversity if ways could be found to transplant them from anti- 
bodies to alternative scaffolds with different properties. In 
this regard, a tissue plasminogen activator (TPA) able to 
recognize integrins αvβ3 and αIIbβ3 was created by inserting
the HCDR3 from a recombinant anti-integrin antibody (61) 
into a TPA loop flanked by a beta sheet structure (43), and 
a functional neocarzinostatin derivative able to recognize 
lysozyme was similarly created by grafting an HCDR3 
from a camelid anti-lysozyme VHH to a surface loop (44). 
In the first experiments reported here, we were also able to 
transfer this VHH CDR3 to two different GFP loops and con- 
fer lysozyme-binding with an affinity comparable to that 
obtained when transferred into neocarzinostatin, indicating 
the generality of transferring HCDR3s with specific binding 
properties to alternative scaffolds. 

The main problem with isolating libraries of HCDR3s,
rather than specific ones, is the fact that their diversity is
embedded within relatively conserved structured beta sheet
framework regions that form extensive contacts with other
VH region amino acids. In this paper we overcome this
problem with a PCR based method that uses flanking type
IIs restriction sites to remove the framework regions after
amplification. This relies on the structural, DNA and amino
acid sequence conservation found at either end of human
HCDR3s: cysteine 104 and tryptophan 119 (IMGT numbering)
are essentially 100% conserved at the amino acid and
nucleotide levels. Structurally these amino acids are joined
by two hydrogen bonds and so are very close to one another,
providing further justification for the use of HCDR3s as
diversity elements. This allowed us to design 13 primers
annealing within the flanking framework regions which were
able to amplify a large percentage of rearranged VH gene
CDR3s. By adding BpmI sites at the ends of the primers, these
framework regions could be removed, leaving two base pair
overhangs residing within the conserved amino acids at either
end of the HCDR3. These overhangs were ligated to sequences
encoding the remaining portion of the HCDR3 (with the
cysteine exchanged for serine), exposed when the negative
selector (SacB) gene was removed from GFP by digestion
with BpmI (Figure 1). This arrangement facilitated the
cloning of a diverse set of HCDR3s, independently of knowledge
of their sequences.

Figure 6. (a) Shows the sequence of random HCDR3s cloned into the different loops. In (b) are shown two HCDR3 sequences that were each found in
different loops. (c) Shows the correlation between HCDR3 length and bacterial fluorescence, expressed as a percentage of the fluorescence of bacteria
expressing GFP.

Figure 7. (a) Shows a polyacrylamide gel of different purified clones containing HCDR3 inserts at different positions. (b) Shows the normalized fluorescent
emission of different clones gradually heated at the rate of 0.1°C/second. For each clone, the first figure indicates the loop insertion site.

By inserting HCDR3 libraries into the tips of five loops on 
one end of GFP, we were able to assess the degree of disturb- 
ance these HCDR3s caused to folding, since GFP must fold 
correctly to become fluorescent (47). Of the eight sites 
examined, five were very permissive, giving mean bacterial 
fluorescence profiles within 2.3-fold of GFP. The remaining 
three sites, although up to 25-fold less fluorescent than GFP, 
were nevertheless significantly fluorescent. The mean
fluorescence per cell is a combination of the number of
fluorescent proteins per cell and the intrinsic fluorescence of
those fluorescent proteins. In Figure 7a, the amount of each
fluorescent protein loaded on the polyacrylamide gel was
normalized for fluorescence. The similar intensity of the
Coomassie blue
staining suggests that the greatest variability between different 
bacterial clones is in the amount of expressed fluorescent 
protein, rather than any difference in intrinsic fluorescence. 
This is supported by the differing levels of protein which 
can be purified from each clone. 

For three of the loops (loops 1, 2 and 3), two insertion sites
were tested. These differed by a single amino acid, and the 
rationale was to determine whether it was sufficient to 
place the HCDR3 within a permissive loop, or whether the 
exact site within that loop was critical. Although there was 
a small (<2.5-fold) difference in the fluorescence at the two
sites for each loop, this was far less than the difference 
observed between different loops (up to 27-fold), suggesting, 
at least within this small set, that the loop targeted is more 
important than the precise position within the loop. We did 
not examine insertion at other sites, and it is possible that
insertion of HCDR3s into secondary structures which are not
loops, or perhaps at the loop extremities, may be significantly 
more disruptive, especially since loops tend to be less well 
conserved than more structured elements. 

We sequenced over 300 cloned HCDR3s. Over 90% had 
the characteristic HCDR3 sequence: C(S)XX . . . XXWG. 
The sequence characteristics of HCDR3s more permissive 
for GFP fluorescence were identified by flow sorting bacteria 
expressing GFP containing HCDR3 libraries in different 
loops. We found that HCDR3s in the more fluorescent clones 
were less likely to contain stop codons, non-characteristic or 
repeated HCDR3 sequences. There was no bias in favor of 
either bulged or non-bulged HCDR3s (37), with the proportion
remaining constant before and after sorting. As both
non-characteristic and repeated HCDR3s are more likely to
be derived from heavily mutated clones, this may explain 
their detrimental effect on GFP fluorescence. 

GFP is known to be an extremely stable protein. The GFP
we used, superfolder (50), was selected to be particularly
resistant to the effects of poorly folding proteins fused to
its N-terminus. This also confers improved stability on the
protein generally (50). This was confirmed in our thermal
stability studies, in which the cooperative transition midpoint
of GFP was 83.5°C, not dissimilar to those of GFP containing
HCDR3 inserts, which ranged from 71 to 83.5°C. This minimal
disturbance likely reflects both the stability of GFP and the
relatively non-perturbing nature of the HCDR3 inserts, which was
also shown by the fact that the spectral properties of these
proteins were essentially identical to those of GFP.

This study was carried out with human HCDR3s because 
of the great deal of sequencing information available. This 
made the design of appropriate framework primers relatively 
straightforward. However, with sufficient information on the 
sequences of appropriate flanking regions, this approach 
could be applied to the harvesting of diversity from the anti- 
body genes of other species. This may be of more than aca- 
demic interest, since the means by which antibody diversity 
has evolved in different species, although sharing some com- 
monalities, has tended to be species-specific (62,63). As a res- 
ult, different CDRs have different properties, lengths and 
amino acid distributions in different species, with tendencies 
to bind to different classes of antigens. Cow (64) and
dromedary heavy chain genes (65,66), for example, have far longer
HCDR3s than humans. In camels, these have been shown
to be important in the mediation of enzyme inhibition by direct
insertion into active sites (67,68), as well as the recognition
of conserved cryptic epitopes of infectious agents, perhaps by
penetration into conserved receptor binding sites (69).
Although less dramatically different, murine HCDR3s tend 
to be shorter than human HCDR3s, with different amino 
acid compositions (36), resulting in more HCDR3s which 
have stabilized hydrogen bond ladder structures, as opposed 
to human HCDR3s which contain more prolines, preventing 
the formation of such ladders. 

The method we describe here can also be adapted to other 
antibody CDRs, as well as to other immunological proteins, 
such as T cell receptors, which share similar primary
structures: variable regions flanked by relatively conserved
framework regions. γ and δ TCRs have CDR3s which resemble
immunoglobulin heavy chain CDR3s in their length variability,
while α and β TCRs have CDR3s which are extremely
homogeneous in length (8-9 amino acids), in common with
the other antibody CDRs (70). The common component of 
all such diversity elements is that they have evolved to 
bind, and so are likely to be more functional than random 
peptides, which generally contain more stop codons and are 
more likely to contain destabilizing inserts. 

An alternative to the use of completely random amino 
acids has been the use of restricted amino acid sets in the
generation of antibody libraries (26,27,71,72). In these 
experiments it has been shown that different amino acid 
diversities at specific sites significantly affect the successful 
outcome of selection experiments, and in one of the most sur- 
prising results, specific high affinity antibodies can be selec- 
ted from libraries in which heavy chain diversity is limited to 
only two amino acids (27). These careful studies of the roles 
of different amino acids in functional diversity are similar to
those which nature has been conducting over evolutionary 
time in the different molecules involved in immune recogni- 
tion in many different species. The method described here 
enables the harvesting of such diversity for transplantation 
into heterologous proteins, setting the stage for the explora- 
tion of the use of libraries containing such sequences for 
selection experiments.